26 research outputs found

    Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

    In this work, we extend the instruction-tuned Llama-2 model with end-to-end, general-purpose speech processing and reasoning abilities while maintaining the wide range of LLM capabilities, without using any carefully curated paired data. The proposed model can utilize audio prompts as a replacement for text and sustain a conversation. Such a model also has extended cross-modal capabilities, such as speech question answering, speech translation, and audio summarization, amongst many other closed- and open-domain tasks. This is unlike prior approaches in speech, in which LLMs are extended to handle audio for a limited number of pre-designated tasks. Experiments show that our end-to-end approach is on par with or outperforms a cascaded system (speech recognizer + LLM) in terms of modeling the response to a prompt. Furthermore, unlike a cascade, our approach can interchange text and audio modalities and utilize the prior context in a conversation to provide better results.
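    The core interface described above (audio prompts standing in for text inside a decoder-only LLM) can be sketched roughly as follows. All module sizes, layer choices, and names here are illustrative assumptions; the actual model, encoder, and training recipe are not specified in this summary.

```python
import torch
import torch.nn as nn

class SpeechLLMSketch(nn.Module):
    """Toy sketch: an LLM whose prompt may mix audio features and text tokens.

    Hypothetical shapes and components; the real system attaches a trained
    speech encoder to an instruction-tuned Llama-2 model.
    """
    def __init__(self, vocab=32000, d_model=512, audio_dim=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # Small projector mapping audio-encoder outputs into the LLM space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, text_ids, audio_feats):
        # Audio embeddings replace text in the prompt: concatenate along
        # the sequence dimension, then decode as usual.
        t = self.embed(text_ids)              # (B, T_text, D)
        a = self.audio_proj(audio_feats)      # (B, T_audio, D)
        x = torch.cat([a, t], dim=1)          # audio prompt, then text
        return self.lm_head(self.backbone(x)) # (B, T_audio + T_text, vocab)
```

    Because the audio segment occupies ordinary sequence positions, the same model can consume text-only, audio-only, or interleaved prompts, which is what enables the modality-interchange behavior the abstract reports.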

    Gendered nationalism : the gender gap in support for the Scottish National Party

    Recent major surveys of the Scottish electorate and of Scottish National Party (SNP) members have revealed a distinct gender gap in support for the party. Men are markedly more likely than women to vote for the SNP, and they comprise more than two-thirds of its membership. In this article, we use data from those surveys to test various possible explanations for the disproportionately male support for the SNP. While popular accounts have focused on the gendered appeal of recent leaders and on the party's fluctuating efforts at achieving gender equality in its parliamentary representation, we find much stronger support for a different explanation. Women are less inclined to support and to join the SNP because they are markedly less supportive of its central objective of independence for Scotland. Since men and women barely differ in their reported national identities, the origins of this gender gap in support for independence present a puzzle for further research.

    Metropolitan Briefing Book, 2007

    The Institute of Portland Metropolitan Studies (IMS) was created to connect the resources of higher education to the needs of the six-county, bi-state Portland-Vancouver metropolitan area (Clackamas, Clark, Columbia, Multnomah, Washington, and Yamhill Counties). In this spirit, we offer our 2007 Metropolitan Briefing Book. Our theme is regional variety. Variety has been touted as the very spice of life (William Cowper) and as the mother of enjoyment (Vivian Grey). Our region enjoys a good deal of variety--in its landscapes, in its economy, and in its people, their cultures, and their attitudes. These differences are important to local vitality and beauty. But while we generally view this variety as positive, we also worry about equity. Although we promote regional thought and action, we must understand that each community experiences the problems facing us in a slightly different way, and often with significantly different resources.

    Prompting Large Language Models with Speech Recognition Abilities

    Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder, allowing them to perform speech recognition. By directly prepending a sequence of audio embeddings to the text token embeddings, the LLM can be converted into an automatic speech recognition (ASR) system and be used in exactly the same manner as its textual counterpart. Experiments on Multilingual LibriSpeech (MLS) show that incorporating a conformer encoder into the open-sourced LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition, despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies investigating whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder stride to generate fewer embeddings. The results show that multilingual ASR is possible even when the LLM is frozen, or when strides of almost one second are used in the audio encoder, opening up the possibility for LLMs to operate on long-form audio.
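    The striding ablation reduces how many audio embeddings the LLM must attend to. One common way to achieve this is to stack consecutive encoder frames; the sketch below illustrates the idea, with frame duration, stacking factor, and function name as assumptions rather than the paper's exact configuration.

```python
import numpy as np

def stack_frames(feats, stack=8, stride=8):
    """Reduce the audio frame rate by stacking consecutive frames.

    feats: (T, D) encoder features, e.g. one frame per 80 ms.
    Returns (n, D * stack); with stack = stride = 8 and 80 ms frames,
    each output embedding covers ~640 ms of audio, so far fewer
    embeddings are prepended to the LLM's text tokens.
    """
    T, D = feats.shape
    n = (T - stack) // stride + 1
    # Each output row is `stack` consecutive frames flattened together.
    return np.stack([feats[i * stride : i * stride + stack].reshape(-1)
                     for i in range(n)])
```

    Larger strides trade temporal resolution for shorter prompt length, which is what makes long-form audio feasible within the LLM's context window.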

    A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

    We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
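    To make the word segmentation task concrete: given an unsegmented symbol stream, a segmenter must recover word boundaries. The sketch below is a much-simplified dynamic-programming segmenter under a unigram lexicon, not the Bayesian models evaluated at the workshop; the lexicon `word_logp` is a hypothetical stand-in for learned word probabilities.

```python
import math

def segment(chars, word_logp):
    """Viterbi segmentation of an unspaced string under a unigram word model.

    chars: unsegmented string, e.g. subword-unit labels concatenated.
    word_logp: dict mapping candidate words to log-probabilities (assumed).
    """
    n = len(chars)
    best = [0.0] + [-math.inf] * n     # best[i]: best score for chars[:i]
    back = [0] * (n + 1)               # back-pointer to the word start
    for i in range(1, n + 1):
        for j in range(max(0, i - 10), i):   # cap word length at 10 symbols
            w = chars[j:i]
            if w in word_logp and best[j] + word_logp[w] > best[i]:
                best[i] = best[j] + word_logp[w]
                back[i] = j
    # Recover the segmentation by walking the back-pointers.
    words, i = [], n
    while i > 0:
        words.append(chars[back[i]:i])
        i = back[i]
    return words[::-1]
```

    Bayesian segmenters differ in that the lexicon itself is inferred jointly with the segmentation, rather than given in advance as here.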

    TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

    Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture, but re-training and re-validating models after each change is resource-intensive. This paper presents TODM (Train Once Deploy Many), a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with GPU-hours comparable to a single training job. TODM leverages insights from prior work on Supernets, in which Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces the layer sizes and widths of the Supernet to obtain subnetworks, yielding smaller models suitable for all hardware types. We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet: adaptive dropouts, in-place Alpha-divergence knowledge distillation, and the ScaledAdam optimizer. We validate our approach by comparing Supernet-trained versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T models on LibriSpeech. Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, by up to 3% relative in word error rate (WER), while keeping the cost of training many models at a small constant.
    Comment: Meta AI; Submitted to ICASSP 202
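    The weight-sharing idea behind the Supernet can be sketched with a single layer: subnetworks of different widths are slices of one maximal parameter tensor, so training any sampled width updates the shared weights. This is an illustrative toy, not TODM's RNN-T architecture or training recipe.

```python
import torch
import torch.nn as nn

class SupernetLinear(nn.Module):
    """One max-width linear layer from which narrower subnetworks are sliced.

    Sampling different widths during training lets many model sizes be
    trained in one job, since all widths share the same parameters.
    """
    def __init__(self, d_in=64, d_out_max=128):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out_max, d_in) * 0.02)

    def forward(self, x, width):
        # A subnetwork uses only the first `width` output units; gradients
        # flow into the shared slice for every sampled width.
        return x @ self.weight[:width].T
```

    Deployment then picks the width that fits a target device, with no re-training.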

    The First Provenance Challenge

    The first Provenance Challenge was set up to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce a provenance representation, over which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
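    A typical challenge query asks which upstream artifacts a given output was derived from. The sketch below shows such a transitive "derived-from" query over a toy provenance graph; the dictionary representation is a hypothetical simplification, as each participating system used its own provenance model and query language.

```python
def ancestors(graph, artifact):
    """Return every artifact the given artifact was transitively derived from.

    graph: dict mapping each artifact to the list of artifacts it was
    directly derived from (an assumed toy encoding of a provenance DAG).
    """
    seen, stack = set(), list(graph.get(artifact, []))
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(graph.get(a, []))   # follow derivations upstream
    return seen
```
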

    A Meta-Regression Analysis to Evaluate the Effects of Narasin on Grow-Finish Pig Performance

    A meta-regression analysis was conducted to evaluate the effects of added narasin in growing-finishing pig diets and to predict its influence on average daily gain (ADG), feed efficiency (G:F), and carcass yield. A database was developed containing 21 technical reports, abstracts, and refereed papers from 2012 to 2021, representing 35 observations of growth performance data in studies ranging from 35 to 116 days in length (overall data). In addition, within these 35 observations, individual period data were evaluated (143 observations) using weekly, bi-weekly, or monthly performance intervals (period data). Regression model equations were developed, and predictor variables were assessed with a stepwise manual forward selection procedure. Important variables in predicting the response to added narasin included the ADG, average daily feed intake (ADFI), and G:F of the control pigs, feeding duration (shorter or longer than 65 days), and body weight (greater than or less than 230 lb). Using median values from the database for the predictor variables, the meta-analysis indicated narasin would be expected to improve ADG by 1.06 to 1.65%, G:F by 0.71 to 1.71%, and carcass yield by 0.31% when fed for longer than 65 days.
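    Structurally, applying such a fitted meta-regression means plugging a study's control performance and duration category into the prediction equation. The coefficients below are placeholders, NOT the published model; only the shape of the calculation (control performance plus a duration indicator as predictors) follows the abstract.

```python
def predict_improvement(control_gf, duration_days, coef):
    """Apply a fitted meta-regression equation to predict the % improvement
    from added narasin.

    control_gf: G:F of the control pigs (a selected predictor variable).
    duration_days: feeding duration; the model splits at 65 days.
    coef: fitted coefficients -- hypothetical values here, for illustration.
    """
    long_feeding = 1.0 if duration_days > 65 else 0.0
    return (coef["intercept"]
            + coef["gf"] * control_gf
            + coef["long"] * long_feeding)
```
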